Distributed Dynamic Failure Detection
Abstract
Failure monitoring and detection is a critical part of providing scalability, reliability, and high availability in current distributed environments. Heartbeat-style interaction is a widely used technique for this purpose: it detects faults by monitoring the heartbeats of system resources continuously at very short intervals. However, this approach has limitations, because it needs a period of time to detect a faulty node, which delays the recovery procedures that follow. This paper presents a fault detection mechanism and service that uses a hybrid heartbeat mechanism and a dynamic estimated time of arrival (ETA) for each heartbeat message. The technique introduces an index server for indexing transactions and combines a dynamic hybrid heartbeat mechanism with a pinging procedure for fault detection. The evaluation shows that the hybrid heartbeat mechanism reduces the time taken to detect faults by approximately 30% compared with existing techniques and provides a basis for customizable recovery actions.
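To make the idea of a dynamic ETA with a ping fallback concrete, here is a minimal Python sketch. It is not the paper's algorithm, which the abstract does not specify. It assumes the ETA is the last arrival time plus the mean of recent inter-arrival intervals, and that an overdue heartbeat triggers one explicit ping before a node is declared faulty. The class name `HybridHeartbeatDetector`, the `ping` callback, the `window` size, and the `safety_margin` are all hypothetical names chosen for illustration, and the index server is not modeled.

```python
from collections import deque
from typing import Callable, Deque, Optional


class HybridHeartbeatDetector:
    """Illustrative detector: a heartbeat that misses its estimated arrival
    time is followed by a ping before the node is declared faulty."""

    def __init__(self, ping: Callable[[], bool], window: int = 5,
                 safety_margin: float = 0.5) -> None:
        self.ping = ping                    # hypothetical active probe
        self.arrivals: Deque[float] = deque(maxlen=window)
        self.safety_margin = safety_margin  # assumed slack, in seconds

    def record_heartbeat(self, now: float) -> None:
        """Store the arrival time of a heartbeat message."""
        self.arrivals.append(now)

    def next_eta(self) -> Optional[float]:
        """Estimate the next arrival as last arrival + mean recent interval."""
        if len(self.arrivals) < 2:
            return None  # not enough history to estimate yet
        times = list(self.arrivals)
        intervals = [b - a for a, b in zip(times, times[1:])]
        return times[-1] + sum(intervals) / len(intervals)

    def status(self, now: float) -> str:
        """Return "alive" or "faulty" for the monitored node at time `now`."""
        eta = self.next_eta()
        if eta is None or now <= eta + self.safety_margin:
            return "alive"
        # Heartbeat is overdue: fall back to an explicit ping.
        return "alive" if self.ping() else "faulty"


# Usage: heartbeats arrive once per second, then stop; the ping also fails.
detector = HybridHeartbeatDetector(ping=lambda: False)
for t in (0.0, 1.0, 2.0, 3.0):
    detector.record_heartbeat(t)
print(detector.next_eta())    # 4.0
print(detector.status(4.2))   # alive (still within the safety margin)
print(detector.status(5.0))   # faulty (overdue and the ping failed)
```

Because the ETA follows the observed intervals, the timeout adapts to each node's actual heartbeat rate instead of relying on one fixed value, and the ping step guards against declaring a fault on a single delayed message.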
Similar resources
ENERGY AWARE DISTRIBUTED PARTITIONING DETECTION AND CONNECTIVITY RESTORATION ALGORITHM IN WIRELESS SENSOR NETWORKS
Mobile sensor networks rely heavily on inter-sensor connectivity for collection of data. Nodes in these networks monitor different regions of an area of interest and collectively present a global overview of some monitored activities or phenomena. A failure of a sensor leads to loss of connectivity and may cause partitioning of the network into disjoint segments. A number of approaches have be...
Failure Detection in P2P-Grid System
Peer-to-peer (P2P)–Grid systems are being investigated as a platform for converging the Grid and P2P network in the construction of large-scale distributed applications. The highly dynamic nature of P2P–Grid systems greatly affects the execution of the distributed program. Uncertainty caused by arbitrary node failure and departure significantly affects the availability of computing resources an...
Fault-tolerant Mobile Agent-based Monitoring Mechanism for Highly Dynamic Distributed Networks
Thanks to the asynchronous and dynamic nature of mobile agents, a number of mobile agent-based monitoring mechanisms have been actively developed to monitor large-scale and dynamic distributed networked systems adaptively and efficiently. Among them, some mechanisms attempt to adapt to dynamic changes in various aspects such as network traffic patterns, resource addition and deletion, netw...
Byzantine Failure Detection for Dynamic Distributed Systems
Byzantine failure detectors provide an elegant abstraction for implementing Byzantine fault tolerance. However, as far as we know, there is no general solution for this problem in a dynamic distributed system over wireless networks with unknown membership. This paper thus presents a first Byzantine failure detector for this context. The protocol has the interesting feature of being time-free, that i...
Optimal Rejuvenation Scheduling of Distributed Computation Based on Dynamic Programming
Recently, a complementary approach to handle transient software failures, called software rejuvenation, is becoming popular as a proactive fault management technique in operational software systems. In this study, we develop the optimal scheduling algorithms to trigger software rejuvenation in distributed computation circumstance. In particular, we focus on two different computation circumstanc...
Self healing distributed systems
The growing complexity of distributed systems demands new ways of control. This work addresses self-healing in distributed environments. The term self-healing represents a fairly new area of research and is used quite broadly, but it can be seen as dynamic fault tolerance. This work proposes generic concepts and algorithms to build self-healing systems. The detection of node failures in...
Journal title: JSW
Volume: 9
Publication date: 2014